Data: World University Rankings 2023¶

About: This 2023 dataset includes 13 performance indicator measures (variables) across four areas of teaching, research, knowledge transfer and international outlook for 1,799 universities globally. The dataset includes over 680,000 data points from Times Higher Education's survey submissions from 40,000 scholars, 121 million citations, and 15.5 million research publications at over 2,500 universities.

There are 2341 observations in this dataset.

| # | Variable | Variable Type | Description |
|---|---|---|---|
| 1 | University rank | chr | Rank of the university worldwide |
| 2 | University name | chr | Name of the university |
| 3 | Location | chr | Physical place where the university is located |
| 4 | No. of students | num | Number of students enrolled in the university as of 2023 |
| 5 | No. of students per staff | dbl | Number of students under one professor |
| 6 | International students | chr | Percentage of international students |
| 7 | Female : male ratio | chr | Ratio of female to male students |
| 8 | Overall score | chr | The combined weighted scores of those given below. Out of 100. |
| 9 | Teaching score | chr | The perceived prestige of the institution based on the Academic Reputation Survey. Out of 100. |
| 10 | Research score | chr | Reputation for research excellence amongst peers based on the Academic Reputation Survey. Out of 100. |
| 11 | Citations score | chr | The number of citations received by a journal in one year to documents published in the three previous years, divided by the number of documents indexed in Scopus published in those same three years. Out of 100. |
| 12 | Industry income score | chr | How much money a university receives from industry in exchange for its academic expertise. Out of 100. |
| 13 | International outlook score | chr | The ability of a university to attract undergraduates, postgraduates and faculty from all over the globe. |

Question¶

How do Female:Male Ratio and International Student % affect University Rank?

In [2]:
# load required packages
library(tidyverse)
library(broom)
library(infer)
library(GGally)
# read the rankings CSV directly from GitHub
rankings_df <- read_csv("https://raw.githubusercontent.com/sabrinalou/stat-301-project/main/World%20University%20Rankings%202023.csv")

head(rankings_df)

Rows: 2341 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (11): University Rank, Name of University, Location, International Stude...
dbl  (1): No of student per staff
num  (1): No of student

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
A tibble: 6 × 13

| University Rank | Name of University | Location | No of student | No of student per staff | International Student | Female:Male Ratio | OverAll Score | Teaching Score | Research Score | Citations Score | Industry Income Score | International Outlook Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <chr> | <chr> | <chr> | <dbl> | <dbl> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> |
| 1 | University of Oxford | United Kingdom | 20965 | 10.6 | 42% | 48 : 52 | 96.4 | 92.3 | 99.7 | 99.0 | 74.9 | 96.2 |
| 2 | Harvard University | United States | 21887 | 9.6 | 25% | 50 : 50 | 95.2 | 94.8 | 99.0 | 99.3 | 49.5 | 80.5 |
| 3 | University of Cambridge | United Kingdom | 20185 | 11.3 | 39% | 47 : 53 | 94.8 | 90.9 | 99.5 | 97.0 | 54.2 | 95.8 |
| 3 | Stanford University | United States | 16164 | 7.1 | 24% | 46 : 54 | 94.8 | 94.2 | 96.7 | 99.8 | 65.0 | 79.8 |
| 5 | Massachusetts Institute of Technology | United States | 11415 | 8.2 | 33% | 40 : 60 | 94.2 | 90.7 | 93.6 | 99.8 | 90.9 | 89.3 |
| 6 | California Institute of Technology | United States | 2237 | 6.2 | 34% | 37 : 63 | 94.1 | 90.9 | 97.0 | 97.3 | 89.8 | 83.6 |

Assignment 2: Exploratory Data Analysis and Visualization¶

Cleaning and Tidying data¶

First, we convert the character columns that hold numeric information to numeric types so that they can be used as continuous variables in the visualizations. These are University Rank, International Student (a percentage), Female:Male Ratio, and the score columns. We also rename columns so that the names are more readable and easier to use in code.
Observations with NA in University Rank after the conversion are removed, since rank is central to the visualizations. Rank entries that cannot be parsed as a single number become NA during the conversion (see the coercion warning in the output below). This leaves 199 universities out of the original 2341.

In [3]:
# replacing spaces in column names with periods
names(rankings_df) <- gsub("\\s+", ".", names(rankings_df))
# changing columns to correct data types, renaming, creating new columns, and removing all NA ranking universities
rankings <- rankings_df |>
    mutate(across(c(University.Rank, Teaching.Score, OverAll.Score, Research.Score, Citations.Score, Industry.Income.Score, International.Outlook.Score), 
    as.numeric)) |>
    mutate(International.Student = as.numeric(gsub("%", "", International.Student)) / 100) |>
    rename(International.Student.Percent = International.Student) |>
    rename(Student.per.Staff = No.of.student.per.staff) |>
    rename(Students = No.of.student)  |>
    rename(Overall.Score = OverAll.Score) |>
    separate('Female:Male.Ratio', into = c("Female", "Male"), sep = " : ", convert = TRUE) |>
    mutate('Female.Male.Ratio' = Female / Male) |>
    select(-Female, -Male) |>
    filter(!is.na(University.Rank))
head(rankings)
tail(rankings)

str(rankings)
Warning message:
“There were 7 warnings in `mutate()`.
The first warning was:
ℹ In argument: `across(...)`.
Caused by warning:
! NAs introduced by coercion
ℹ Run `dplyr::last_dplyr_warnings()` to see the 6 remaining warnings.”
A tibble: 6 × 13

| University.Rank | Name.of.University | Location | Students | Student.per.Staff | International.Student.Percent | Overall.Score | Teaching.Score | Research.Score | Citations.Score | Industry.Income.Score | International.Outlook.Score | Female.Male.Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| 1 | University of Oxford | United Kingdom | 20965 | 10.6 | 0.42 | 96.4 | 92.3 | 99.7 | 99.0 | 74.9 | 96.2 | 0.9230769 |
| 2 | Harvard University | United States | 21887 | 9.6 | 0.25 | 95.2 | 94.8 | 99.0 | 99.3 | 49.5 | 80.5 | 1.0000000 |
| 3 | University of Cambridge | United Kingdom | 20185 | 11.3 | 0.39 | 94.8 | 90.9 | 99.5 | 97.0 | 54.2 | 95.8 | 0.8867925 |
| 3 | Stanford University | United States | 16164 | 7.1 | 0.24 | 94.8 | 94.2 | 96.7 | 99.8 | 65.0 | 79.8 | 0.8518519 |
| 5 | Massachusetts Institute of Technology | United States | 11415 | 8.2 | 0.33 | 94.2 | 90.7 | 93.6 | 99.8 | 90.9 | 89.3 | 0.6666667 |
| 6 | California Institute of Technology | United States | 2237 | 6.2 | 0.34 | 94.1 | 90.9 | 97.0 | 97.3 | 89.8 | 83.6 | 0.5873016 |
A tibble: 6 × 13

| University.Rank | Name.of.University | Location | Students | Student.per.Staff | International.Student.Percent | Overall.Score | Teaching.Score | Research.Score | Citations.Score | Industry.Income.Score | International.Outlook.Score | Female.Male.Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <dbl> | <chr> | <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| 194 | University of Miami | United States | 17009 | 10.8 | 0.16 | 54.6 | 48.4 | 33.5 | 81.0 | 48.3 | 60.2 | 1.127660 |
| 196 | University of Erlangen-Nuremberg | Germany | 30303 | 43.4 | 0.13 | 54.5 | 44.6 | 47.5 | 68.8 | 90.7 | 53.5 | 1.040816 |
| 196 | Sichuan University | China | 49543 | 15.8 | 0.06 | 54.5 | 57.1 | 58.6 | 48.6 | 93.4 | 38.7 | NA |
| 198 | Durham University | United Kingdom | 18425 | 14.1 | 0.35 | 54.4 | 40.0 | 44.6 | 70.0 | 39.4 | 94.3 | 1.173913 |
| 198 | Queen’s University Belfast | NA | 19060 | 15.8 | 0.39 | 54.4 | 31.1 | 37.9 | 84.4 | 41.6 | 97.4 | 1.325581 |
| 198 | University of Reading | United Kingdom | 15720 | 16.4 | 0.32 | 54.4 | 36.5 | 39.6 | 78.5 | 42.2 | 93.3 | 1.272727 |
tibble [199 × 13] (S3: tbl_df/tbl/data.frame)
 $ University.Rank              : num [1:199] 1 2 3 3 5 6 7 8 9 10 ...
 $ Name.of.University           : chr [1:199] "University of Oxford" "Harvard University" "University of Cambridge" "Stanford University" ...
 $ Location                     : chr [1:199] "United Kingdom" "United States" "United Kingdom" "United States" ...
 $ Students                     : num [1:199] 20965 21887 20185 16164 11415 ...
 $ Student.per.Staff            : num [1:199] 10.6 9.6 11.3 7.1 8.2 6.2 8 18.4 5.9 11.2 ...
 $ International.Student.Percent: num [1:199] 0.42 0.25 0.39 0.24 0.33 0.34 0.23 0.24 0.21 0.61 ...
 $ Overall.Score                : num [1:199] 96.4 95.2 94.8 94.8 94.2 94.1 92.4 92.1 91.4 90.4 ...
 $ Teaching.Score               : num [1:199] 92.3 94.8 90.9 94.2 90.7 90.9 87.6 86.4 92.6 82.8 ...
 $ Research.Score               : num [1:199] 99.7 99 99.5 96.7 93.6 97 95.9 95.8 92.7 90.8 ...
 $ Citations.Score              : num [1:199] 99 99.3 97 99.8 99.8 97.3 99.1 99 97 98.3 ...
 $ Industry.Income.Score        : num [1:199] 74.9 49.5 54.2 65 90.9 89.8 66 76.8 55 59.8 ...
 $ International.Outlook.Score  : num [1:199] 96.2 80.5 95.8 79.8 89.3 83.6 80.3 78.4 70.9 97.5 ...
 $ Female.Male.Ratio            : num [1:199] 0.923 1 0.887 0.852 0.667 ...
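
To make the conversions in the cleaning step concrete, the following is a minimal base R sketch on a few hypothetical strings that mimic the raw columns. The values, including the non-numeric rank entries, are made up for illustration and are not taken from the dataset.

```r
# Hypothetical toy values that mimic the raw CSV columns (not taken from the dataset)
raw_rank  <- c("1", "2", "201-250", "Reporter")
raw_intl  <- c("42%", "25%", "8%", NA)
raw_ratio <- c("48 : 52", "50 : 50", NA, "37 : 63")

# Entries that are not a single number become NA (normally with a coercion warning)
rank_num <- suppressWarnings(as.numeric(raw_rank))

# Strip the percent sign and rescale to a proportion
intl_prop <- as.numeric(gsub("%", "", raw_intl)) / 100

# Split "F : M" on the separator and divide the two parts; missing input stays NA
parts <- strsplit(raw_ratio, " : ", fixed = TRUE)
fm_ratio <- vapply(parts, function(p) {
  if (length(p) == 2) as.numeric(p[1]) / as.numeric(p[2]) else NA_real_
}, numeric(1))

data.frame(rank_num, intl_prop, fm_ratio)
```

Rows whose rank is NA are the ones that `filter(!is.na(University.Rank))` drops, while rows with a missing ratio are kept and only excluded later by the plotting functions.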

Visualizations¶

In [4]:
# a scatterplot matrix to assess linear correlations between selected variables
rankings_matrix <- rankings |>
                    select(University.Rank, Student.per.Staff, International.Student.Percent, Overall.Score, 
                           Teaching.Score, Research.Score, Citations.Score, Industry.Income.Score, 
                           International.Outlook.Score, Female.Male.Ratio) |>
                            ggpairs() + 
                            ggtitle("Figure 1. Scatterplot Matrix of 'Rankings' with Selected Variables")
rankings_matrix
Warning message in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
“Removed 26 rows containing missing values”
Warning message:
“Removed 26 rows containing missing values (`geom_point()`).”
Warning message:
“Removed 26 rows containing non-finite values (`stat_density()`).”
[Image: Figure 1. Scatterplot Matrix of 'Rankings' with Selected Variables]

From Figure 1, we can see that University Rank has strong negative correlations (|r| > 0.7) with Overall Score, Teaching Score, and Research Score, which is expected. A negative correlation here means that higher scores are associated with numerically lower ranks, that is, with "better" ranked universities. Among the non-score variables, the strongest correlations with University Rank are those of International Student Percentage and Student per Staff.
Female:Male Ratio does not correlate strongly with any of the variables. Its correlations are mostly negative; the exceptions are weak positive correlations with Student per Staff, Citations Score, and University Rank. The positive correlation with University Rank means that "better" ranked schools tend to have lower female-to-male ratios.
Overall Score has strong positive correlations with Teaching and Research scores and a strong negative correlation with University Rank, as expected. It also correlates positively with Citations, Industry Income, and International Outlook scores, in descending order of strength. This shows which scores track the Overall Score most closely, and it motivates exploring whether less obvious variables (i.e., Female:Male Ratio and International Student Percentage) are related to each individual score category. Overall Score has a weak negative correlation with Female:Male Ratio. International Student Percentage has a strong correlation with International Outlook Score, as expected; its next strongest correlations are with Overall Score, University Rank, and the other scores. Note that there is a negative correlation with Industry Income Score and Female:Male Ratio.
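
The correlation values discussed above can also be pulled out as numbers rather than read off the plot. The sketch below uses simulated stand-in data (the column names `rank`, `score`, and `ratio` are hypothetical) and assumes that pairwise deletion of missing values is acceptable.

```r
# Simulated stand-in data (hypothetical; not the rankings dataset)
set.seed(1)
n <- 50
score <- sort(runif(n, 50, 100), decreasing = TRUE)
toy <- data.frame(
  rank  = rank(-score),                          # the best score gets rank 1
  score = score,
  ratio = c(rnorm(n - 5, 1, 0.2), rep(NA, 5))    # a few missing values
)

# Pairwise Pearson correlations, dropping missing values pair by pair
r <- cor(toy, use = "pairwise.complete.obs")
round(r, 2)

# Rank and score move in opposite directions, so this value is strongly negative
r["rank", "score"]
```

With the real data, the same call could be applied to the numeric columns of `rankings` to tabulate the coefficients shown in Figure 1.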

From this scatterplot matrix, we proceed by visualizing the explanatory variables of interest with boxplots. Boxplots make it easy to compare the distribution of a continuous variable (i.e., its median, quartiles, and outliers) across the levels of a categorical variable. Therefore, I will split University Rank into four groups of 50 ranks each to see whether the relationships differ across rank groups. This lets us assess how Female:Male Ratio and International Student Percentage vary with University Rank.
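
One detail worth checking when binning ranks with `cut()` is which side of each interval is closed, because it decides where boundary ranks such as 51 end up. The following sketch uses hypothetical boundary ranks to compare the default behaviour with `right = FALSE`.

```r
# Hypothetical ranks sitting on the group boundaries
ranks  <- c(1, 50, 51, 100, 101, 150, 151, 200)
breaks <- c(1, 51, 101, 151, 201)
labels <- c("1 to 50", "51 to 100", "101 to 150", "151 to 200")

# Default right = TRUE builds (a, b] intervals, so rank 51 lands in the first group
right_closed <- cut(ranks, breaks = breaks, labels = labels, include.lowest = TRUE)

# right = FALSE builds [a, b) intervals, which match the labels
left_closed <- cut(ranks, breaks = breaks, labels = labels,
                   include.lowest = TRUE, right = FALSE)

data.frame(ranks, right_closed, left_closed)
```

Only the boundary ranks 51, 101, and 151 differ between the two columns, so the choice matters most when several universities are tied at those ranks.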

Furthermore, I would like to explore each score category's relationship with the non-score variables to identify any relationships that the correlations with Overall Score do not reveal.

In [5]:
# mean and median female:male ratio calculations
ratio_mean <- mean(rankings$Female.Male.Ratio, na.rm = TRUE)
ratio_median <- median(rankings$Female.Male.Ratio, na.rm = TRUE)

# mean and median intl student % calculations
intl_mean <- mean(rankings$International.Student.Percent, na.rm = TRUE)
intl_median <- median(rankings$International.Student.Percent, na.rm = TRUE)

# mean and median overall score calculations
overall_mean <- mean(rankings$Overall.Score, na.rm = TRUE)
overall_median <- median(rankings$Overall.Score, na.rm = TRUE)

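# University.Rank column as a factor for boxplots
# right = FALSE makes the intervals left-closed, [1, 51), [51, 101), ..., so that the
# boundary ranks 51, 101 and 151 fall in the group their label describes
rank_categories <- cut(rankings$University.Rank,
                       breaks = c(1, 51, 101, 151, 201),
                       labels = c("1 to 50", "51 to 100", "101 to 150", "151 to 200"),
                       include.lowest = TRUE,
                       right = FALSE)
rankings_factored <- mutate(rankings, University.Rank = rank_categories)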

# ranking boxplots

# female:male ratio boxplot
ggplot(data = rankings_factored, aes(x = University.Rank, y = Female.Male.Ratio)) +
  geom_boxplot() +
  labs(x = "University Rank", y = "Female:Male Ratio") +
  ggtitle("Figure 2A. Boxplot of Female:Male Ratio across Rank") +
    geom_hline(yintercept = ratio_median, color = "red") +
    geom_text(aes(x = 4,
                  y = ratio_median - 0.1, 
                label = "Median"), 
            color = "red", hjust = 0, vjust = -1, size = 3) +
   theme(
     text = element_text(size = 14),
     plot.title = element_text(size = 12, face = "bold"),
     axis.title = element_text(face = "bold")
   )

# intl student % boxplot
ggplot(data = rankings_factored, aes(x = University.Rank, y = International.Student.Percent)) +
  geom_boxplot() +
  labs(x = "University Rank", y = "International Student %") +
  ggtitle("Figure 2B. Boxplot of International Student % across Rank") +
    geom_hline(yintercept = intl_median, color = "red") +
    geom_text(aes(x = 4,
                  y = intl_median, 
                label = "Median"), 
            color = "red", hjust = 0, vjust = -1, size = 3) +
    geom_hline(yintercept = intl_mean, color = "blue") +
    geom_text(aes(x = 1,
                  y = intl_mean - 0.03, 
                label = "Mean"), 
            color = "blue", hjust = 0, vjust = -1, size = 3) +
   theme(
     text = element_text(size = 14),
     plot.title = element_text(size = 12, face = "bold"),
     axis.title = element_text(face = "bold")
   )
Warning message:
“Removed 26 rows containing non-finite values (`stat_boxplot()`).”
[Image: Figure 2A. Boxplot of Female:Male Ratio across Rank]
[Image: Figure 2B. Boxplot of International Student % across Rank]

From the Figure 2 boxplots, we notice that Female:Male Ratio fluctuates across the University Rank groups, suggesting a relationship that may be more complex than a linear one and that should be explored further. International Student Percentage and Overall Score showed steady trends that agree with the correlation values we saw in Figure 1.

Score Exploration¶

The scatterplot matrices below show, for each score category that contributes to the Overall Score, its correlations with our two variables of interest. Exploring these relationships may reveal indirect effects of our variables of interest on Overall Score, and thus on University Rank.

In [6]:
# select the six score columns; the dot is escaped so it matches a literal "."
score_columns <- grep("\\.Score$", names(rankings), value = TRUE)

scores_matrix <- function(df, x_var, y_vars) {
  title <- paste("Scatterplot Matrix for", x_var)
  ggpairs(df, columns = c(x_var, y_vars),
          lower = list(continuous = "points"),
          diag = list(continuous = "blank")) +
    ggtitle(title)
}
scores_matrices <- lapply(score_columns, function(score_col) {
    scores_matrix(rankings, score_col, c("Female.Male.Ratio", "International.Student.Percent"))
})

# Display the scatterplot matrices and figure title
print("Figure 3. Scatterplot Matrices for Score Categories vs Female:Male Ratio and International Student %")
scores_matrices
Warning message in check_and_set_ggpairs_defaults("diag", diag, continuous = "densityDiag", :
“Changing diag$continuous from 'blank' to 'blankDiag'”
[1] "Figure 3. Scatterplot Matrices for Score Categories vs Female:Male Ratio and International Student %"
Warning message in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
“Removed 26 rows containing missing values”
Warning message:
“Removed 26 rows containing missing values (`geom_point()`).”
[Image: Scatterplot Matrix for Overall.Score]
[Image: Scatterplot Matrix for Teaching.Score]
[Image: Scatterplot Matrix for Research.Score]
[Image: Scatterplot Matrix for Citations.Score]
[Image: Scatterplot Matrix for Industry.Income.Score]
[Image: Scatterplot Matrix for International.Outlook.Score]

Assignment 3: Methods and Plans¶

This section proposes a method to address the research question: How do Female:Male Ratio and International Student % affect University Rank? We will explore this by examining the relationships between Female:Male Ratio, International Student %, the individual score variables, Overall Score, and University Rank for the 199 institutions retained after cleaning.

Proposed Method: Multiple Linear Regression with Interaction Term¶

Multiple linear regression (MLR) is a suitable method for this analysis because:

  • Multiple predictors: It allows us to examine the relationship between several independent variables (Female:Male Ratio, International Student %, Teaching Score, Citations Score, Research Score, Industry Income Score, International Outlook Score, and Overall Score) and a single dependent variable, University Rank.
  • Modeling a continuous outcome: The scores, ratio, and percentage are continuous variables. Linear regression can model the relationship between continuous independent variables and a continuous dependent variable, provided its assumptions hold reasonably well.
  • Interaction between explanatory variables: Including interaction terms in the model allows us to analyze how Female:Male Ratio and International Student % relate to the score variables. A statistically significant interaction term would provide evidence that Female:Male Ratio and International Student % influence how the Overall Score, and thus University Rank, is obtained.
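
As an illustration of the additive and interaction models described in the list above, here is a minimal sketch on simulated data with a known interaction. The variables `x1`, `x2`, and `y` are hypothetical stand-ins, not columns of the rankings dataset.

```r
# Simulated data with a known interaction (hypothetical; not the rankings dataset)
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 1 * x2 + 0.5 * x1 * x2 + rnorm(n, sd = 0.5)
sim <- data.frame(y, x1, x2)

# Additive model: main effects only
fit_add <- lm(y ~ x1 + x2, data = sim)

# Interaction model: x1 * x2 expands to x1 + x2 + x1:x2
fit_int <- lm(y ~ x1 * x2, data = sim)

names(coef(fit_int))
summary(fit_int)$coefficients

# F-test of whether the interaction term improves on the additive fit
anova(fit_add, fit_int)
```

In R's formula syntax, `*` always adds the main effects together with every interaction among the listed variables, so chaining it across many predictors grows the number of terms quickly.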

Limitations of Linear Regression:¶

Linearity: The model assumes that the relationship between the variables is linear, when in reality it might not be.

Outliers: Outliers in the dataset can significantly impact the results of linear regression and lead to inaccurate results.

Multicollinearity: If the independent variables are highly correlated, it can lead to unstable coefficient estimates. We can check for multicollinearity using correlation analysis and variance inflation factors (VIF).
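
A VIF check can be sketched in base R as follows. The helper `vif_manual()` and the simulated predictors are hypothetical and only illustrate the definition; packages such as `car` provide a ready-made `vif()` for fitted models.

```r
# Simulated predictors (hypothetical): x2 is built to be strongly correlated with x1
set.seed(7)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)
x3 <- rnorm(n)
preds <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j on the others
vif_manual <- function(df) {
  vapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    r2 <- summary(lm(reformulate(others, response = v), data = df))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

vifs <- vif_manual(preds)
round(vifs, 2)
```

Values close to 1 indicate little collinearity. A common rule of thumb treats values above roughly 5 to 10 as a sign that coefficient estimates may be unstable.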

Assignment 4: Computational Code and Output¶

Implementation of a Proposed Model¶

In [18]:
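print("Table 1A. University Rank Additive Model with Categorical Scores as Covariates")
# additive model: rank regressed on the five category scores, with 95% confidence intervals
rankings_mlr <- lm(University.Rank ~ Teaching.Score + Research.Score + Citations.Score + Industry.Income.Score + International.Outlook.Score, data = rankings) |> 
    tidy(conf.int = TRUE, conf.level = 0.95) |>
    mutate(across(where(is.numeric), \(x) round(x, 2)))
rankings_mlr

print("Table 1B. University Rank Interaction Model with Major Categorical Scores, Female:Male Ratio, and International Student % as Covariates")
# `*` expands to all main effects and all interactions among the five variables;
# only the first 16 terms (main effects and most two-way interactions) are displayed
rankings_mlr_int <- lm(University.Rank ~ Teaching.Score*Research.Score*Citations.Score*Female.Male.Ratio*International.Student.Percent, data = rankings) |> 
    tidy(conf.int = TRUE, conf.level = 0.95) |>
    mutate(across(where(is.numeric), \(x) round(x, 2)))
head(rankings_mlr_int, n = 16)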
[1] "Table 1A. University Rank Additive Model with Categorical Scores as Covariates"
A tibble: 6 × 7

| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| (Intercept) | 422.99 | 15.12 | 27.97 | 0.00 | 393.16 | 452.82 |
| Teaching.Score | -0.93 | 0.24 | -3.84 | 0.00 | -1.40 | -0.45 |
| Research.Score | -1.93 | 0.22 | -8.63 | 0.00 | -2.37 | -1.49 |
| Citations.Score | -1.39 | 0.15 | -9.08 | 0.00 | -1.69 | -1.09 |
| Industry.Income.Score | -0.18 | 0.09 | -2.15 | 0.03 | -0.35 | -0.02 |
| International.Outlook.Score | -0.40 | 0.10 | -4.15 | 0.00 | -0.58 | -0.21 |
[1] "Table 1B. University Rank Interaction Model with Major Categorical Scores, Female:Male Ratio, and International Student % as Covariates"
A tibble: 16 × 7

| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| (Intercept) | 3189.71 | 1341.18 | 2.38 | 0.02 | 538.29 | 5841.12 |
| Teaching.Score | -50.81 | 23.20 | -2.19 | 0.03 | -96.68 | -4.94 |
| Research.Score | -26.79 | 21.49 | -1.25 | 0.21 | -69.28 | 15.69 |
| Citations.Score | -30.72 | 15.07 | -2.04 | 0.04 | -60.52 | -0.93 |
| Female.Male.Ratio | -1701.80 | 1263.26 | -1.35 | 0.18 | -4199.17 | 795.57 |
| International.Student.Percent | -9494.44 | 5313.29 | -1.79 | 0.08 | -19998.45 | 1009.57 |
| Teaching.Score:Research.Score | 0.49 | 0.33 | 1.50 | 0.14 | -0.16 | 1.13 |
| Teaching.Score:Citations.Score | 0.55 | 0.27 | 2.07 | 0.04 | 0.02 | 1.08 |
| Research.Score:Citations.Score | 0.22 | 0.24 | 0.93 | 0.35 | -0.25 | 0.70 |
| Teaching.Score:Female.Male.Ratio | 35.71 | 21.71 | 1.64 | 0.10 | -7.21 | 78.64 |
| Research.Score:Female.Male.Ratio | 14.67 | 20.99 | 0.70 | 0.49 | -26.83 | 56.17 |
| Citations.Score:Female.Male.Ratio | 19.86 | 14.14 | 1.41 | 0.16 | -8.08 | 47.81 |
| Teaching.Score:International.Student.Percent | 168.79 | 91.13 | 1.85 | 0.07 | -11.38 | 348.96 |
| Research.Score:International.Student.Percent | 86.02 | 84.32 | 1.02 | 0.31 | -80.67 | 252.72 |
| Citations.Score:International.Student.Percent | 104.04 | 59.38 | 1.75 | 0.08 | -13.35 | 221.43 |
| Female.Male.Ratio:International.Student.Percent | 9382.35 | 4781.72 | 1.96 | 0.05 | -70.78 | 18835.48 |

Table 1A shows that all five score coefficients are statistically significant (p < 0.05) and negative, meaning that an increase in any score is associated with a decrease in the response variable (i.e., a better, numerically lower rank). Teaching, Research, and Citations Scores have the largest estimated effects on University Rank, so they are the scores included in the interaction model. In Table 1B, neither Female:Male Ratio nor International Student % has a statistically significant two-way interaction with Teaching, Research, or Citations Score. The interaction between Female:Male Ratio and International Student % is borderline: its p-value rounds to 0.05, and its 95% confidence interval (-70.78 to 18835.48) includes zero. Its positive estimate suggests that the slope of University Rank on International Student % increases as the Female:Male Ratio increases, but this evidence is weak, and Table 1B lists only the first 16 terms of the model.
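
The sign of an interaction coefficient can be read through the conditional slope it implies. The following is a simplified sketch that keeps only the two variables of interest, writing F for Female:Male Ratio and I for International Student %; the symbols are introduced here for illustration and are not part of the fitted model's output.

```latex
\widehat{\text{Rank}} = \beta_0 + \beta_F F + \beta_I I + \beta_{FI}\, F I
\quad\Longrightarrow\quad
\frac{\partial\, \widehat{\text{Rank}}}{\partial I} = \beta_I + \beta_{FI}\, F
```

With a positive interaction coefficient, the slope of rank on I grows as F increases. In the fitted model the slope also depends on the terms that involve Teaching, Research, and Citations Scores, so the two coefficients from Table 1B alone do not determine it.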
